Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Netron output (ONNX JSON) #3717

Merged
merged 31 commits into from
Dec 20, 2024
Merged

Netron output (ONNX JSON) #3717

merged 31 commits into from
Dec 20, 2024

Conversation

CharlieL7
Copy link
Collaborator

@CharlieL7 CharlieL7 commented Dec 16, 2024

  • Adds --netron option to the MIGX driver that outputs a JSON file in the ONNX format that is readable by the Netron application
  • Need to test out a couple different ways of handling base64 encoding
  • Probably want to make issues for each of the TODOs in the code

@CharlieL7 CharlieL7 self-assigned this Dec 16, 2024
@CharlieL7 CharlieL7 requested a review from causten as a code owner December 16, 2024 21:30
@CharlieL7 CharlieL7 marked this pull request as draft December 16, 2024 21:30
@pfultz2
Copy link
Collaborator

pfultz2 commented Dec 16, 2024

Add the base64 dependency to the requirements.txt file. Need to add find_path for it in the CMake to add the include path.

src/netron_output.cpp Outdated Show resolved Hide resolved
src/netron_output.cpp Outdated Show resolved Hide resolved
@CharlieL7 CharlieL7 marked this pull request as ready for review December 18, 2024 21:26
Copy link
Collaborator

@TedThemistokleous TedThemistokleous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just fix your Tidy Errors

@TedThemistokleous TedThemistokleous added enhancement New feature or request roadmap Tasks to finish for a release labels Dec 18, 2024
@CharlieL7
Copy link
Collaborator Author

CharlieL7 commented Dec 18, 2024

I found a bug with the implementation because of the bit math used that doesn't work with the alterations I made. Going back to using a library instead since I have not found a decently compact snippet that passes CI. Fixed the bug, needed to ignore the clang warning about unsafe buffer access since method directly works on the bytes of the input string.

src/base64.cpp Outdated Show resolved Hide resolved
src/base64.cpp Outdated Show resolved Hide resolved
src/base64.cpp Outdated Show resolved Hide resolved
src/base64.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@lakhinderwalia lakhinderwalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Complicated arithmetic shifts.
(Approved).

src/netron_output.cpp Show resolved Hide resolved
src/base64.cpp Outdated Show resolved Hide resolved
std::string b64_encode(const std::vector<byte>& buf)
{
std::size_t len = buf.size();
std::vector<byte> res_vec((len + 2) / 3 * 4, '=');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result should be stored as std::string, there is no reason to prefill it with =, you just push_back the characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push_back onto strings of very large lengths can be a performance issue. Ideally the space should be just reserved, rather than filled with =.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserve can be called if there is a perf issue. I dont think its that important here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't use std::string as is because the type being unsigned char is important.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could probably use basic_string<unsigned char>, but that's more code changes for what looks like a marginal benefit

std::size_t pad_cond = len % 3;
const size_t last = len - pad_cond;

for(size_t i = 0; i < last; i += 3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the iterators directly for the loop:

for(auto it = buf.begin(); it < buf.end(); it += 3)
{
    std::size_t n = to_int(it[0]) << 16u | to_int(it[1]) << 8u | to_int(it[2]);

    ...
}

And add the to_int function to bit_cast it and convert it to std::size_t.

Copy link
Collaborator

@pfultz2 pfultz2 Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it would be better to make a function to encode the triplet since you repeat the same below for the "padding":

template<class Input>
std::array<char, 4> encode(Input input)
{
    std::size_t n = to_int(it[0]) << 16u | to_int(it[1]) << 8u | to_int(it[2]);
    return {b64_chars.at(n >> 18u), b64_chars.at(n >> 12u & 0x3Fu), b64_chars.at(n >> 6u & 0x3Fu), b64_chars.at(n & 0x3Fu) };
}

Then the loop can do:

for(auto it = buf.begin(); it < buf.end(); it += 3)
{
    copy(encode(it), std::back_inserter(result));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as iterator usage is concerned, lt or gt comparison is not recommended against buf.end(). It should strictly be used by an equality comparison: == or !=.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as iterator usage is concerned, lt or gt comparison is not recommended against buf.end()

Not recommended, by who? These are random access iterators so they support comparison operators just like pointers. The < operator should definitely be used here since we are skipping over by increments of 3, which means it could skip past the end, and it would become an infinite loop since we would never reach the end as we already past it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it could be UB if its past buf.end() so we probably need to do buf.end() - remaining to avoid that.

res_vec.at(j++) = b64_chars.at(pad_cond != 0 ? n >> 10u & 0x3Fu : n >> 2u);
res_vec.at(j++) = b64_chars.at(pad_cond != 0 ? n >> 4u & 0x03Fu : n << 4u & 0x3Fu);
res_vec.at(j++) = pad_cond != 0 ? b64_chars.at(n << 2u & 0x3Fu) : '=';
}
Copy link
Collaborator

@pfultz2 pfultz2 Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reuse the encode function so it doesnt repeat the code. You can also copy the chars into an array(this way pad_cond is not getting sublty modified):

assert(pad_cond < 3);
std::array<char, 3> triple = {0};
// Get the remaining characters to encode
std::copy(buf.end() - pad_cond, buf.end(), triple.begin());
auto e = encode(triple);
// Add the encoded characters
std::copy(e.begin(), e.begin() + 1 + pad_cond, std::back_inserter(result));
// Pad string with `=`
result.append(3 - pad_cond, '=');

@CharlieL7
Copy link
Collaborator Author

I'm not inclined the rewrite the base64 encode code for a 5th time so I'm going to leave it as is for now. We can improve on it later..

test/base64_test.cpp Outdated Show resolved Hide resolved
test/base64_test.cpp Outdated Show resolved Hide resolved
std::string expected{"AAAA"};
std::string actual{migraphx::b64_encode({input.begin(), input.end()})};
EXPECT(expected == actual);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the variables should be inlined: EXPECT("AAAA" == migraphx::b64_encode({'\x00', '\x00', '\x00' }).

src/netron_output.cpp Outdated Show resolved Hide resolved
@causten causten merged commit 6acc1f9 into develop Dec 20, 2024
40 of 45 checks passed
@causten causten deleted the onnx_json_output branch December 20, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request roadmap Tasks to finish for a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants